369 research outputs found

    Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

    Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the de facto standard approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the de facto standard estimators.
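
    The kind of bias described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' corrected estimator. It assumes the conventional estimator multiplies position-specific nucleotide frequencies, drops stop codons, and renormalizes; it also assumes the universal genetic code and uses a made-up toy set of codons. All function names are hypothetical.

```python
from itertools import product

NUCS = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: universal genetic code


def position_freqs(codons):
    """Observed nucleotide frequencies at each of the three codon positions."""
    counts = [{n: 0 for n in NUCS} for _ in range(3)]
    for codon in codons:
        for pos, nuc in enumerate(codon):
            counts[pos][nuc] += 1
    total = len(codons)
    return [{n: c[n] / total for n in NUCS} for c in counts]


def product_codon_freqs(pos_freqs):
    """Codon frequencies as products of position frequencies,
    with stop codons removed and the remainder renormalized."""
    raw = {}
    for a, b, c in product(NUCS, repeat=3):
        codon = a + b + c
        if codon not in STOPS:
            raw[codon] = pos_freqs[0][a] * pos_freqs[1][b] * pos_freqs[2][c]
    norm = sum(raw.values())
    return {codon: f / norm for codon, f in raw.items()}


def implied_position_freqs(codon_freqs):
    """Nucleotide frequencies per position implied by a codon distribution."""
    implied = [{n: 0.0 for n in NUCS} for _ in range(3)]
    for codon, f in codon_freqs.items():
        for pos, nuc in enumerate(codon):
            implied[pos][nuc] += f
    return implied


# Toy data (hypothetical): sense codons only, as in a coding alignment.
codons = ["ATG", "GCT", "TTA", "TGG", "AAA", "TCA", "GAT", "TAC"]
observed = position_freqs(codons)
codon_freqs = product_codon_freqs(observed)
implied = implied_position_freqs(codon_freqs)

# All stop codons start with T, so removing them deflates T at position 1.
print(observed[0]["T"], implied[0]["T"])
```

    In this toy case the frequency of T at the first codon position implied by the renormalized codon distribution is lower than the observed one, because every stop codon begins with T. A corrected estimator would need to account for that lost probability mass.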

    Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

    Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not so large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated not to depend very much on protein families but amino acid pairs, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins. 
    The present codon-based model, with the ML estimates of selective constraints and with adjustable nucleotide mutation rates, would be useful as a simple substitution model in ML and Bayesian inference of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both the nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724

    Episodic Evolution and Adaptation of Chloroplast Genomes in Ancestral Grasses

    It has been suggested that the chloroplast genomes of the grass family, Poaceae, have undergone an elevated evolutionary rate compared to most other angiosperms, yet the details of this phenomenon have remained obscure. To know how the rate change occurred during evolution, estimation of the time-scale with reliable calibrations is needed. The recent finding of 65 Ma grass phytoliths in Cretaceous dinosaur coprolites places the diversification of the grasses to the Cretaceous period, and provides a reliable calibration in studying the tempo and mode of grass chloroplast evolution. By using chloroplast genome data from angiosperms and by taking account of new paleontological evidence, we now show that episodic rate acceleration both in terms of non-synonymous and synonymous substitutions occurred in the common ancestral branch of the core Poaceae (a group formed by rice, wheat, maize, and their allies) accompanied by adaptive evolution in several chloroplast proteins, while the rate reverted to the slow rate typical of most monocot species in the terminal branches. Our finding of episodic rate acceleration in the ancestral grasses accompanied by adaptive molecular evolution has a profound bearing on the evolution of grasses, which form a highly successful group of plants. The widely used model for estimating divergence times was based on the assumption of correlated rates between ancestral and descendant lineages. However, the assumption is proved to be inadequate in approximating the episodic rate acceleration in the ancestral grasses, and the assumption of independent rates is more appropriate. This finding has implications for studies of molecular evolutionary rates and time-scale of evolution in other groups of organisms.

    Identification of physicochemical selective pressure on protein encoding nucleotide sequences

    BACKGROUND: Statistical methods for identifying positively selected sites in protein coding regions are among the most commonly used tools in evolutionary bioinformatics. However, they have been limited by not taking the physicochemical properties of amino acids into account. RESULTS: We develop a new codon-based likelihood model for detecting site-specific selection pressures acting on specific physicochemical properties. Nonsynonymous substitutions are divided into substitutions that differ with respect to the physicochemical properties of interest, and those that do not. The substitution rates of these two types of changes, relative to the synonymous substitution rate, are then described by two parameters, γ and ω respectively. The new model allows us to perform likelihood ratio tests for positive selection acting on specific physicochemical properties of interest. The new method is first used to analyze simulated data and is shown to have good power and accuracy in detecting physicochemical selective pressure. We then re-analyze data from the class-I alleles of the human Major Histocompatibility Complex (MHC) and from the abalone sperm lysin. CONCLUSION: Our new method allows a more flexible framework to identify selection pressure on particular physicochemical properties.
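
    The division of substitutions into synonymous, property-conserving (ω), and property-altering (γ) changes can be sketched in Python as follows. This is a simplified illustration, not the authors' model. It assumes the standard genetic code, uses side-chain charge as a hypothetical example of the "property of interest", assigns rate zero to changes of more than one nucleotide, and ignores transition/transversion bias and codon frequencies.

```python
# Standard genetic code, built from the usual compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical property of interest: side-chain charge.
CHARGE = {"K": "+", "R": "+", "H": "+", "D": "-", "E": "-"}


def charge_class(amino_acid):
    """Return '+', '-', or '0' (neutral) for an amino acid."""
    return CHARGE.get(amino_acid, "0")


def relative_rate(codon_from, codon_to, omega, gamma):
    """Rate of a codon change relative to the synonymous rate (1.0).

    omega: nonsynonymous change that conserves the property.
    gamma: nonsynonymous change that alters the property.
    """
    differences = sum(x != y for x, y in zip(codon_from, codon_to))
    aa_from = GENETIC_CODE[codon_from]
    aa_to = GENETIC_CODE[codon_to]
    # Assumption: only single-nucleotide changes between sense codons.
    if differences != 1 or "*" in (aa_from, aa_to):
        return 0.0
    if aa_from == aa_to:
        return 1.0
    if charge_class(aa_from) != charge_class(aa_to):
        return gamma
    return omega


print(relative_rate("GAT", "GAC", omega=0.2, gamma=1.5))  # Asp -> Asp
print(relative_rate("GAT", "GAA", omega=0.2, gamma=1.5))  # Asp -> Glu
print(relative_rate("GAT", "AAT", omega=0.2, gamma=1.5))  # Asp -> Asn
```

    Under these assumptions, a value of γ above 1 at a site would indicate that changes altering the chosen property are favoured there, which is the quantity the likelihood ratio tests in the abstract target.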

    Ancient DNA Elucidates the Controversy about the Flightless Island Hens (Gallinula sp.) of Tristan da Cunha

    A persistent controversy surrounds the flightless island hen of Tristan da Cunha, Gallinula nesiotis. Some believe that it became extinct by the end of the 19th century. Others suppose that it still inhabits Tristan. There is no consensus about Gallinula comeri, the name introduced for the flightless moorhen from the nearby island of Gough. On the basis of DNA sequencing of both recently collected and historical material, we conclude that G. nesiotis and G. comeri are different taxa, that G. nesiotis indeed became extinct, and that G. comeri now inhabits both islands. This study confirms that among gallinules seemingly radical adaptations (such as the loss of flight) can readily evolve in parallel on different islands, while conspicuous changes in other morphological characters fail to occur.

    Implications of the Plastid Genome Sequence of Typha (Typhaceae, Poales) for Understanding Genome Evolution in Poaceae

    Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three gene losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.

    Whole-Gene Positive Selection, Elevated Synonymous Substitution Rates, Duplication, and Indel Evolution of the Chloroplast clpP1 Gene

    Synonymous DNA substitution rates in the plant chloroplast genome are generally relatively slow and lineage dependent. Non-synonymous rates are usually even slower due to purifying selection acting on the genes. Positive selection is expected to speed up non-synonymous substitution rates, whereas synonymous rates are expected to be unaffected. Until recently, positive selection has seldom been observed in chloroplast genes, and large-scale structural rearrangements leading to gene duplications are hitherto supposed to be rare. [...] genes experiencing negative (purifying) selection are characterized by having very conserved lengths, genes under positive selection often have large insertions of more or less repetitive amino acid sequence motifs. [...] gene and surrounding regions, repetitive amino acid sequences, and increase in synonymous substitution rates. The present study sheds light on the controversial issue of whether negative or positive selection is to be expected after gene duplications by providing evidence for the latter alternative. The observed increase in synonymous substitution rates in some of the lineages indicates that the detection of positive selection may be obscured under such circumstances. Future studies are required to explore the functional significance of the large inserted repeated amino acid motifs, as well as the possibility that synonymous substitution rates may be affected by positive selection.

    CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

    Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.

    Viral Evolution and Cytotoxic T Cell Restricted Selection in Acute Infant HIV-1 Infection

    Antiretroviral therapy-naive HIV-1 infected infants experience poor viral containment and rapid disease progression compared to adults. Viral factors (e.g. transmitted cytotoxic T-lymphocyte (CTL) escape mutations) or infant factors (e.g. reduced CTL functional capacity) may explain this observation. We assessed CTL functionality by analysing selection in CTL-targeted HIV-1 epitopes following perinatal infection. HIV-1 gag, pol and nef sequences were generated from a historical repository of longitudinal specimens from 19 vertically infected infants. Evolutionary rate and selection were estimated for each gene and in CTL-restricted and non-restricted epitopes. Evolutionary rate was higher in nef and gag vs. pol, and lower in infants with non-severe immunosuppression vs. severe immunosuppression across gag and nef. Selection pressure was stronger in infants with non-severe immunosuppression vs. severe immunosuppression across gag. The analysis also showed that infants with non-severe immunosuppression had stronger selection in CTL-restricted vs. non-restricted epitopes in gag and nef. Evidence of stronger CTL selection was absent in infants with severe immunosuppression. These data indicate that infant CTLs can exert selection pressure on gag and nef epitopes in early infection and that stronger selection across CTL epitopes is associated with favourable clinical outcomes. These results have implications for the development of paediatric HIV-1 vaccines.